We report on the status of the ILDG Middleware Working Group.



1. INTRODUCTION 

The Middleware Working Group was formed 
with the aim of designing standard middleware to 
allow the interoperation of the data grids of ILDG 
member collaborations. Details of the working 
group are given in section 6. In this contribution
we outline the role of middleware in the
ILDG, present our proposed middleware archi- 
tecture and discuss our current status and future 
work within the working group. 

1.1. What is Middleware?

Consider a lattice gauge theory practitioner in 
the US wishing to retrieve data from the data 
grid in the UK. He 1 is used to the way things 
work in the US. However, for historical reasons, 
the UK developed its own "drive on the left"
grid software before the advent of the ILDG. The
US and UK systems are different. They use dif- 
ferent databases for the catalogues. They use 
different storage systems. How is the researcher 
to avoid the headache of re-learning everything 
he has painstakingly learned about the US data 
grid?

1 For the sake of grammar only, we assume that the
researcher is male.

The researcher interacts with applications
(such as web browsers) which we will also refer
to as clients. The actual databases holding the
catalogues and the storage systems are called the
back end. The middleware comprises the layers
of abstractions, interfaces, services and protocols
between the applications and the back end. One
piece of middleware, for example, would be a front
end to a catalogue to which an application could send
queries in a single standardised way, irrespective
of the actual back end database. Another
piece of middleware may be an abstraction such 
as the concept of a logical name given to a piece 
of data. The logical name differs from a filename 
in the sense that it does not encode the location 
of the data. Hence applications dealing with log- 
ical names instead of file names can immediately 
work between different grids sharing the name- 
space of files. However, in order to retrieve the 
data, logical names still need to be resolved to 
actual file names. This can be done by sending 
the logical name to a service that can return the 
location details. The file can then be downloaded 
using a particular file transfer protocol, such as 
FTP, HTTP, or GSIFTP. 
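
The idea of a catalogue front end that hides the back end database can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interface name Catalogue, its find method, the metadata fields and the logical names are all hypothetical and are not part of any ILDG specification.

```python
import sqlite3
from typing import Protocol


class Catalogue(Protocol):
    """Hypothetical standard interface seen by every application."""

    def find(self, field: str, value: str) -> list[str]:
        """Return logical names whose metadata `field` equals `value`."""
        ...


class DictCatalogue:
    """Back end A: metadata held in an in-memory dictionary."""

    def __init__(self, entries: dict[str, dict[str, str]]) -> None:
        self._entries = entries

    def find(self, field: str, value: str) -> list[str]:
        return sorted(
            name for name, md in self._entries.items() if md.get(field) == value
        )


class SqlCatalogue:
    """Back end B: the same metadata held in a relational database."""

    def __init__(self, entries: dict[str, dict[str, str]]) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE md (name TEXT, field TEXT, val TEXT)")
        rows = [(n, f, v) for n, md in entries.items() for f, v in md.items()]
        self._db.executemany("INSERT INTO md VALUES (?, ?, ?)", rows)

    def find(self, field: str, value: str) -> list[str]:
        cur = self._db.execute(
            "SELECT name FROM md WHERE field = ? AND val = ? ORDER BY name",
            (field, value),
        )
        return [row[0] for row in cur]


def application(catalogue: Catalogue) -> list[str]:
    """Client code: identical whichever back end sits behind the interface."""
    return catalogue.find("beta", "5.29")


# Made-up example entries, keyed by logical name.
ENTRIES = {
    "cfg-0001": {"beta": "5.29", "action": "wilson"},
    "cfg-0002": {"beta": "5.20", "action": "wilson"},
    "cfg-0003": {"beta": "5.29", "action": "wilson"},
}
```

Calling application(DictCatalogue(ENTRIES)) and application(SqlCatalogue(ENTRIES)) gives the same list of logical names, which is the property the middleware is meant to guarantee.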

1.2. Web Services 

Middleware can thus allow the interoperation 
of data grids given that the abstractions, pro- 
tocols, interfaces and services comprising it are 
standardised. How can the interfaces and services
be standardised?

In the past, gateways to services were server 
programs which interacted with client programs 
through some messaging protocol. Custom and 
sometimes unportable messages were often used. 
Web services are modern versions of these server
programs, with the difference that the definitions
of the interfaces and the messaging protocol
have been standardised. Messages and interfaces
are specified in Extensible Markup Language
(XML) [1], using the Simple Object Access
Protocol (SOAP) [2] for messages and Web Services
Description Language (WSDL) [3] for the
interfaces. XML, SOAP and WSDL are industry
standards defined and maintained by the W3C
consortium [4].
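
To make the notion of a standardised message concrete, the following Python sketch builds and parses a SOAP 1.1 envelope using only the standard library. The envelope namespace is the one defined by SOAP 1.1; the service namespace urn:example:ildg-mdc, the operation name doQuery and its parameter are assumptions made purely for illustration and do not come from any ILDG WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:ildg-mdc"  # hypothetical service namespace


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[1] if tag.startswith("{") else tag


def build_request(operation: str, params: dict[str, str]) -> bytes:
    """Wrap one operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(message: bytes) -> tuple[str, dict[str, str]]:
    """Recover the operation name and parameters on the server side."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) != 1:
        raise ValueError("expected a SOAP Body holding exactly one operation")
    call = body[0]
    params = {_local_name(child.tag): child.text or "" for child in call}
    return _local_name(call.tag), params
```

In practice a client would normally generate such messages from the WSDL description with a SOAP toolkit rather than by hand; the sketch only shows what travels over the wire.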

2. WEB SERVICE ARCHITECTURE 

2.1. Overview 

The ILDG middleware architecture is to be 
based on a collection of stateless web services. 
These are to provide a standardised interface to 
back end services such as a local storage system 
or the grid service layer of a non-ILDG data grid.

Figure 1. The Middleware Web Service Architecture



We illustrate the middleware architecture in
figure 1. The large rectangles represent the grids
belonging to hypothetical participating
collaborations X and Y. Each is split into a
Middleware half and a Locally Implemented Back End
half. The ILDG Middleware web services
appear in the Middleware half. We have drawn
in the three main web services:

• ILDG Metadata Catalogue (MDC), 

• ILDG Storage Resource Manager (SRM),

• ILDG Replica Catalogue (RC) service. 

These web services interact with back end
implementations. The back end implementations
are specific to the participating grid. For example,
the back end storage system may be directly
controlled by the ILDG SRM service for
collaborations who use SRM to manage their data.
Alternatively, the back end storage system may be
some collaboration-specific non-SRM implementation,
in which case the ILDG SRM web service
acts as an "interoperability layer" allowing
ILDG-compliant applications to treat the back end
storage as if it were an SRM. What is important from
the point of view of interoperability is that the
applications see a standard web service interface.

Figure 1 also shows possible data transfers between
the application, the web services and the
back end. The SOAP messages exchanged be- 
tween the application and the web services are 
shown as dashed-dotted lines. The solid line be- 
tween the application and the back end storage 
system illustrates the idea of third party trans- 
fers, where the storage system can initiate data 
upload/download between itself and the applica- 
tion directly. We also show the hypothetical case 
of the ILDG MDC returning the result of a meta- 
data query, and how this can be rendered from 
its XML form into a human readable web page by 
passing it through an HTML rendering servlet. 

Finally we illustrate in figure 1 a possible way
for applications to learn of the existence and 
whereabouts of the ILDG Web Services of partici- 
pating collaborations. This may be done through 
the use of directory services. An application can 
consult, say, the MDC Directory service to dis- 
cover how to contact one or more ILDG MDC 
services. It is not yet clear who will operate the 
directories. Each collaboration may operate one, 
or perhaps the ILDG could maintain one or more 
global instances of these services, details of which 
may be made public on the ILDG web pages. 

At this point we should highlight that the mid- 
dleware working group regards its primary role 
as that of standardising the web services in the 
architecture just presented. The working group 
does not feel responsible for providing the appli- 
cations or the back end services described. 

2.2. Naming the Data 

It is envisaged that each item of data in the 
ILDG will be identified by a global logical file- 
name (GFN). The space of GFNs will be par- 
titioned between participating collaborations to 
avoid name clashes. The individual collaborations
will then be responsible for managing their
own allocated name spaces. 

Since a data item may be replicated, a GFN
may correspond to several copies of the data. The
mapping between GFNs and individual files is
maintained by the RC. We will refer to the names of
these individual files as site uniform resource
locators (SURLs). SURLs may be presented to the SRM
service in order to retrieve the actual files. 
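
The naming scheme can be illustrated with a toy replica catalogue in Python. It is a minimal sketch, assuming that each collaboration owns a GFN prefix and that SURLs are opaque strings; the gfn:// and srm:// forms and the class and method names are hypothetical, since the actual GFN syntax and RC interface are not fixed in this report.

```python
class ReplicaCatalogue:
    """Toy RC: maps each GFN in one collaboration's name space to its SURLs."""

    def __init__(self, owned_prefix: str) -> None:
        self._prefix = owned_prefix  # part of the GFN space allocated to us
        self._replicas: dict[str, list[str]] = {}

    def register(self, gfn: str, surl: str) -> None:
        """Record one more copy of the data item named by `gfn`."""
        if not gfn.startswith(self._prefix):
            raise ValueError(f"{gfn!r} is outside the name space {self._prefix!r}")
        surls = self._replicas.setdefault(gfn, [])
        if surl not in surls:
            surls.append(surl)

    def remove(self, gfn: str, surl: str) -> None:
        """Forget a copy, for example after the file has migrated."""
        surls = self._replicas.get(gfn, [])
        if surl in surls:
            surls.remove(surl)

    def lookup(self, gfn: str) -> list[str]:
        """Return every known SURL for `gfn` (possibly none)."""
        return list(self._replicas.get(gfn, []))
```

Because each catalogue refuses GFNs outside its own prefix, two collaborations cannot accidentally register the same name, which is the purpose of partitioning the name space.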

2.3. Metadata Catalogue 

The ILDG MDC service has the task of allow- 
ing the standardised interrogation of the MDCs 
of participating collaborations. Applications can 
present metadata queries to the ILDG MDC. 
The replies can be either GFNs or they can be 
full or partial metadata instance documents that 
are selected by the query. Since only queries are
supported, the ILDG MDC has read-only semantics. It is envisaged that
the ILDG MDC will interrogate a locally im- 
plemented MDC which allows for maintenance 
of the local catalogue. In other words: inser- 
tion/deletion of metadata is expected to be han- 
dled outside of the ILDG framework by the par- 
ticipating collaborations. 

2.4. Replica Catalogue 

The ILDG RC service has to track various ex- 
isting copies of a given file. Essentially, it per- 
forms a mapping from a GFN to one or more 
SURLs. In order to allow files to migrate it may 
be necessary for the back end services or applica- 
tions to create and remove entries from the cat- 
alogue. Further, we expect that as files migrate, 
some of the SURLs returned from the replica
catalogue may become invalid. The burden of
dealing with this complexity is pushed onto the
applications. Prototype RCs exist at the Jefferson
Laboratory 2 (JLab) and at Fermilab 3.

2 in full: Thomas Jefferson National Accelerator Facility
3 in full: Fermi National Accelerator Laboratory

2.5. The SRM

The Storage Resource Manager (SRM) has the
task of managing the storage system within a
collaborating data grid. SRM is actually a
sophisticated storage resource management system
developed jointly by a variety of institutions. The
design of the SRM is led by the Storage Resource
Management Working Group, which is expected
soon to become a Global Grid Forum Grid Storage
Management Working Group.

At the time of writing, version 2.1 of SRM has
been defined and the WSDL definition has been
made available online. The JLab has an
implementation of the SRM 2.1 specification, which
was completed in the middle of summer 2004.

The chief envisioned functionality of the SRM 
in the ILDG, which may be much more lim- 
ited than its complete functionality, is to allow 
the downloading of files. On the presentation of 
an SURL, the SRM identifies the individual file 
server which holds the file and returns a transfer 
URL (TURL) to the file. This is a URL which 
can be used to download the actual file, for ex- 
ample by using the wget utility with the TURL if 
the download is to proceed via the HTTP trans- 
fer protocol, or the globus-url-copy utility if the 
transfer is to proceed through GSIFTP. 
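
The download path described above, together with the need for applications to cope with stale SURLs, might look as follows in Python. This is a sketch under stated assumptions: ToySrm is a hypothetical stand-in for a real SRM service, the SURLs and TURLs are made up, and the code only constructs the wget or globus-url-copy command line rather than running it.

```python
from urllib.parse import urlparse


class StaleSurlError(Exception):
    """Raised by the toy SRM when a SURL no longer points at a file."""


class ToySrm:
    """Hypothetical stand-in for an SRM service: SURL in, TURL out."""

    def __init__(self, table: dict[str, str]) -> None:
        self._table = table

    def get_turl(self, surl: str) -> str:
        try:
            return self._table[surl]
        except KeyError:
            raise StaleSurlError(surl) from None


def download_command(turl: str, dest: str) -> list[str]:
    """Choose the transfer utility from the TURL scheme.

    `dest` is assumed to be an absolute local path.
    """
    scheme = urlparse(turl).scheme
    if scheme == "http":
        return ["wget", "-O", dest, turl]
    if scheme == "gsiftp":
        return ["globus-url-copy", turl, f"file://{dest}"]
    raise ValueError(f"no transfer tool known for scheme {scheme!r}")


def plan_download(surls: list[str], srm: ToySrm, dest: str) -> list[str]:
    """Try each replica in turn, skipping SURLs that have become invalid."""
    for surl in surls:
        try:
            turl = srm.get_turl(surl)
        except StaleSurlError:
            continue  # the application, not the RC, absorbs the stale entry
        return download_command(turl, dest)
    raise LookupError("no valid replica found")
```

The returned list could be handed to subprocess.run by a real client; keeping command construction separate from execution makes the choice of protocol easy to check in isolation.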

3. FUTURE WORK 

We have presented a high level overview of the 
middleware architecture as it is currently envis- 
aged. Many details still require discussion, in par- 
ticular the detailed WSDL definition of the inter- 
faces of all the components remains to be com- 
pleted. This is particularly true for the case of 
the MDC and the RC. Work on the MDC is un- 
derway at the CCS at Tsukuba, and at Fermilab. 
A prototype RC implementation is nearly ready 
at the JLab. 

The UKQCD collaboration will attempt to im- 
plement an SRM compatibility layer in order to 
allow the transfer of data between its UK QCDGrid
and a standard SRM such as the one at the
JLab. 

The interaction of MDCs between grids still 
needs to be clarified. One suggestion has been 
to produce a specification which is recursive. In 
this case one can interact with a root MDC, which 
transparently queries lower level MDCs it knows 
about. Another suggestion which has already 
been alluded to is the use of directory services. 
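
The recursive suggestion can be sketched as a tree of catalogues in which a query to the root is forwarded to the lower level MDCs. The Python below is one possible reading of that idea, not a proposed specification; the class name Mdc, the field-equals-value query form and the example GFNs are all assumptions.

```python
class Mdc:
    """Toy MDC node: answers from its own records, then asks its children."""

    def __init__(self, name, records, children=()):
        self.name = name
        self._records = records          # GFN -> {field: value}
        self._children = list(children)  # lower level MDCs known to this one

    def query(self, field, value, _visited=None):
        """Return the sorted GFNs matching field == value in this subtree."""
        visited = set() if _visited is None else _visited
        if self.name in visited:         # guard against cyclic registrations
            return []
        visited.add(self.name)
        hits = {g for g, md in self._records.items() if md.get(field) == value}
        for child in self._children:
            hits.update(child.query(field, value, visited))
        return sorted(hits)
```

An application that only knows the root then sees a single catalogue, at the price of the root having to know about, and wait for, every lower level MDC.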

An interesting unresolved issue is how to perform
metadata queries. The metadata working
group has proposed a definition of QCDML which
assumes a hierarchical XML data model.
However, hierarchical and XML databases are not as
mature and well known as relational ones, and it 
may be desirable to hold the metadata in a rela- 
tional database at least at the lowest level. The 
question is whether to present an XML (XQuery) 
or a relational (SQL) view of the databases for the 
applications to query, or perhaps to provide both. 
If the SQL view is to be provided, it may be useful
to define a relational, table-based view of the
QCDML schema in order to allow straightforward 
use of SQL back end databases. 
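
One simple way to obtain a relational view of hierarchical metadata is to store every leaf element as a (GFN, path, value) row. The Python sketch below does this with the standard library and answers the same question through both views. The XML element names here are invented for illustration and are not the QCDML schema, and the path-based query is a plain tree walk rather than real XQuery.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up metadata documents; element names are NOT taken from QCDML.
DOCS = """
<catalogue>
  <config gfn="gfn://collab-x/cfg-0001">
    <physics><beta>5.29</beta><kappa>0.1350</kappa></physics>
  </config>
  <config gfn="gfn://collab-x/cfg-0002">
    <physics><beta>5.20</beta><kappa>0.1350</kappa></physics>
  </config>
</catalogue>
"""


def leaf_rows(elem, prefix=""):
    """Yield (path, text) for every leaf element below `elem`."""
    for child in elem:
        path = f"{prefix}/{child.tag}"
        if len(child) == 0:
            yield path, (child.text or "").strip()
        else:
            yield from leaf_rows(child, path)


def build_views(xml_text):
    """Return the parsed tree and an SQLite table holding its leaves."""
    root = ET.fromstring(xml_text)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE metadata (gfn TEXT, path TEXT, val TEXT)")
    for cfg in root.findall("config"):
        rows = [(cfg.get("gfn"), p, v) for p, v in leaf_rows(cfg)]
        db.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    return root, db


def query_hierarchical(root, path, value):
    """Hierarchical view: walk the XML tree along `path`."""
    return [c.get("gfn") for c in root.findall("config")
            if c.findtext(path) == value]


def query_relational(db, path, value):
    """Relational view: the same question expressed in SQL."""
    cur = db.execute(
        "SELECT gfn FROM metadata WHERE path = ? AND val = ? ORDER BY gfn",
        ("/" + path, value),
    )
    return [row[0] for row in cur]
```

A flat path table of this kind is easy to generate but loses ordering and repeated-element structure, so a hand-designed table layout may still be preferable for a real schema.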

4. FILE FORMATS AND PACKAGING 

So far we have neglected issues pertaining to 
the format and packaging of data. The middle- 
ware working group is currently in discussion with 
the metadata working group on these issues. 

The key question is whether the ILDG should 
or should not mandate a standard gauge con- 
figuration file format. Should a file format be 
mandated, the question arises as to how to pro- 
vide tools so that collaborations can transform 
the data between the standard format and their 
own formats in a straightforward manner. It is 
also possible to not mandate a standard format, 
but to provide a standard way to describe the for- 
mat of the stored configurations from which tools 
can be generated to effect the transformation be- 
tween formats. 

One option for the former case is to maintain 
program code as part of the metadata whereas 
in the latter case the binary data layout may be 
completely specified by BinX markup. The
BinX project also provides a software library that 
can be used to perform transformations on the 
data such as a rearrangement of indices, reversal 
of bit and byte order and the selection of vari- 
ous slices, making it straightforward to write pro- 
grams to transform the data in any desired way. 
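
Two of the transformations mentioned, reversal of byte order and rearrangement of indices, are simple enough to sketch in plain Python. The code below does not use the BinX library or its markup; the function names, the eight-byte word size and the row-major layout are assumptions chosen only to illustrate what such a conversion tool has to do.

```python
import struct


def swap_byte_order(raw: bytes, word_size: int = 8) -> bytes:
    """Reverse the bytes within each word (big-endian <-> little-endian)."""
    if len(raw) % word_size:
        raise ValueError("data length is not a multiple of the word size")
    return b"".join(
        raw[i:i + word_size][::-1] for i in range(0, len(raw), word_size)
    )


def transpose_2d(values, rows, cols):
    """Rearrange a row-major [rows][cols] array into [cols][rows] order."""
    if len(values) != rows * cols:
        raise ValueError("array size does not match the given dimensions")
    return [values[r * cols + c] for c in range(cols) for r in range(rows)]


def read_doubles(raw: bytes, big_endian: bool) -> list[float]:
    """Decode a buffer of IEEE 754 double precision numbers."""
    count = len(raw) // 8
    fmt = (">" if big_endian else "<") + f"{count}d"
    return list(struct.unpack(fmt, raw))
```

A format description would supply the word size, byte order and index order as data, so that the same small set of routines can convert between any two layouts.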

5. CONCLUSIONS 

Currently, the main limitation of the middleware
working group is a lack of manpower.
Nonetheless, progress is being made, albeit slowly.
The working group plans to hold a working meet- 
ing to be attended by the conveners and some 
implementers to finish the definitions of the web 
services described here. 

6. ABOUT THE WORKING GROUP 

The working group has three joint conveners: 

• William Watson from the Thomas Jefferson 
National Accelerator Facility (JLab), New- 
port News, VA, USA, 

• Mitsuhisa Sato from the Centre for Compu- 
tational Sciences, Tsukuba, Japan, 

• Bálint Joó from the School of Physics,
University of Edinburgh, Edinburgh, UK.

We would also like to highlight the participation 
of Eric H. Neilsen from Fermilab, Batavia, IL,
USA, who, while not a convener, has been an
extremely active member of the working group,
contributing a lot of material including analysis, 
use cases and samples of WSDL that have greatly 
aided the progress of the working group. 

Communication in the working group has hith- 
erto proceeded by email and through the public 
mailing list [7] which is archived at [5]. Further 
information about the working group is also
available through the ILDG Web Site.